JOIN
條件範圍時,執行計畫預估值容易不準確,這也間接導致查詢效能不好.
就算有建立Index也會遇到上述問題
假如我們想要提升JOIN
條件範圍效能並讓Index可以發揮最大最用可以怎麼做?
就讓我利用一個範例來跟大家分享.
本文同步發表Blog JOIN範圍條件Index優化
此範例有使用到三張表
CREATE TABLE [dbo].[Product](
[ProductId] INT NOT NULL
)
CREATE TABLE [dbo].[T99](
[TransactionId] [int] IDENTITY(1,1) NOT NULL,
[Amount] DECIMAL(18,6),
[CreateDate] [datetime2](3) NULL
)
GO
CREATE TABLE [dbo].[ReportPeriod](
[PerioidID] [int] IDENTITY(1,1) NOT NULL,
[ProductId] INT NOT NULL,
[IsGenerate] [bit] NULL,
[StartDate] [datetime2](3) NULL,
[EndDate] [datetime2](3) NULL
) ON [PRIMARY]
GO
我們利用
T99.CreateDate
來跟ReportPeriod
判斷是屬於哪期報表.
我們利用亂數產生Sample資料來模擬線上大資料狀況.
ReportPeriod
期別由'2019-08-01'
到'2020-07-31'
因為產生報表以5分鐘為區間,所以可以利用CTE遞迴來幫我們產生資料.
INSERT INTO [dbo].[Product] VALUES (1);
INSERT INTO [dbo].[Product] VALUES (2);
INSERT INTO [dbo].[Product] VALUES (3);
INSERT INTO [dbo].[Product] VALUES (4);
INSERT INTO [dbo].[Product] VALUES (5);
INSERT INTO [dbo].[Product] VALUES (6);
INSERT INTO [dbo].[Product] VALUES (7);
INSERT INTO [dbo].[Product] VALUES (8);
INSERT INTO [dbo].[Product] VALUES (9);
INSERT INTO [dbo].[Product] VALUES (10);
declare @FromDate DATETIME2(3) = '2019-08-01'
declare @ToDate DATETIME2(3) = '2020-07-31'
;WITH CTE AS (
SELECT @FromDate fromDt,@ToDate endDt
UNION ALL
SELECT DATEADD(MINUTE,5,fromDt),endDt
FROM CTE
WHERE DATEADD(MINUTE,5,fromDt) < endDt
)
INSERT INTO [dbo].[ReportPeriod] ([ProductId],[IsGenerate],[StartDate],[EndDate])
SELECT ProductId,0,fromDt,DATEADD(MINUTE,5,fromDt)
FROM CTE CROSS JOIN dbo.Product
OPTION (MAXRECURSION 0);
INSERT INTO T99 ([CreateDate],Amount)
SELECT top 1000000 dateadd(SECOND,
rand(checksum(newid()))*(1+datediff(SECOND, @FromDate, @ToDate)),
@FromDate),
CAST(RAND(CHECKSUM(NEWID())) * 100000 as INT) + 1
FROM sys.all_columns c1 CROSS JOIN sys.all_columns c2
CREATE CLUSTERED INDEX [CIX_ReportPeriod_StartDate] ON [dbo].[ReportPeriod]
(
[StartDate] ASC,
[EndDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
ALTER TABLE [dbo].[ReportPeriod] ADD CONSTRAINT [PK_ReportPeriod] PRIMARY KEY NONCLUSTERED
(
[PerioidID] ASC,
[ProductId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE CLUSTERED INDEX [CIX_T99_CreateDate] ON [dbo].[T99]
(
[CreateDate] ASC
)
CREATE UNIQUE NONCLUSTERED INDEX [IX_T99_TransactionId] ON [dbo].[T99]
(
[TransactionId] ASC
)
在線上我們會使用UDT當作參數來取得某些期別資訊.
這裡為了方便模擬我使用Table Variable來取代.
set nocount on
DECLARE @Transaction AS TABLE(
TransactionId INT,
ProductId INT
);
INSERT INTO @Transaction VALUES (1,1)
INSERT INTO @Transaction VALUES (101,2)
INSERT INTO @Transaction VALUES (1001,3)
SELECT p.*
FROM dbo.T99 t
JOIN [dbo].[ReportPeriod] p ON t.CreateDate BETWEEN p.StartDate AND p.EndDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId
使用上面語法我們只需查詢三個期別資料,但看執行計畫時能發現,ReportPeriod使用的Clustered預估資訊有1百多萬筆
Q:我明明有對於條件建立Index,但為什麼預估值卻會跑真那麼嚴重?
CREATE CLUSTERED INDEX [CIX_ReportPeriod_StartDate] ON [dbo].[ReportPeriod]
(
[StartDate] ASC,
[EndDate] ASC
)
原因出在範圍條件會因為查找範圍過大導致預估值不準確
甚麼意思? 讓我們看看下圖(代表ReportPeriod
內含日期資料)
而我們在JOIN
條件只有t.CreateDate BETWEEN p.StartDate AND p.EndDate
這就會導致,我們需要查找ReportPeriod
日期資料在挑出符合的資料
JOIN [dbo].[ReportPeriod] p ON t.CreateDate BETWEEN p.StartDate AND p.EndDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId
最後就會看到走針的估計值
效能差問題,選擇對Index和撰寫合理的查詢可以改善40%左右問題
我們思考一下如果可以把範圍條件改成精準=
查找條件不就可以更精準預估資訊了?
t.CreateDate BETWEEN p.StartDate AND p.EndDate
那我們怎麼把上面條件使用=
取代BETWEEN
範圍查詢呢?
這時我們可以利用空間來換取時間
建立一個新的COLUMN
運用算法來計算每個期數StartTime
例如:CreateDate = 2020/01/03 10:08:55
會歸類在2020/01/03 10:05:00
中
ALTER TABLE dbo.T99 ADD PeriodDate AS DATEADD(MINUTE,DATEPART(MINUTE,CreateDate) %5 * -1,
DATETIMEFROMPARTS(
DATEPART(YEAR,CreateDate),
DATEPART(MONTH,CreateDate),
DATEPART(DAY,CreateDate),
DATEPART(HOUR,CreateDate),
DATEPART(MINUTE,CreateDate),0,0)
)
建立完新COLUMN
後別忘記加入一個Index
給此COLUMN
.
CREATE INDEX IX_PeriodDate_T99 ON dbo.T99(
PeriodDate
)
最後我們修改一下查詢語法
SELECT p.*
FROM dbo.T99 t
JOIN [dbo].[ReportPeriod] p ON p.StartDate = t.PeriodDate
JOIN @Transaction t1 ON t.TransactionId = t1.TransactionId AND p.ProductId = t1.ProductId
預估值和讀取值已經可以大幅降低了!!
在JOIN
範圍條件差效能問題,可以思考一下是否有辦法利用算法或是公式來優化查詢效能,如此次範例一樣.